Existing software analysis tools use the semantics of the programming language to 
check our codes: Are variables declared and initialized? Do variable types match? 
Where do memory leaks and memory errors occur? However, the meaning or 
semantics that a code developer builds into his/her code extends far beyond 
programming language semantics. Scientific code developers use variables to 
represent physical and mathematical quantities (mass, derivative), expressions of 
quantities to represent physical formulae (Navier-Stokes equation), loops to apply these 
formulae in a domain, and conditional expressions to control execution. These 
semantic details are crucial when developers and users try to understand and check 
their scientific and engineering codes; further, their analysis is manual, time-consuming, 
and error-prone. 

This paper reports progress in an experiment to automatically recognize and check 
these physical and mathematical semantics. The experimental procedure combines 
semantic declarations with a pattern recognition capability; the code (1) 

      C? MA == mass, ACC == acceleration                              (1)
      FF = MA * ACC

contains two semantic declarations for MA and ACC, and with Newton's law among the 
recognizable patterns, the procedure recognizes this code as force assigned to FF. 
These formula patterns are represented in and recognized by parsers. The 
conclusions of this procedure are displayed for the user as shown in Figure 1. A more 
detailed explanation of this procedure and its extensions is given in Reference 2. 
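
The recognition step described above can be illustrated with a minimal Python sketch. It is not the paper's Yacc-based implementation: the pattern table, the function names (parse_declarations, recognize), and the restriction to simple two-operand assignments are all assumptions made for illustration.

```python
import re

# Hypothetical pattern table: (left quantity, operator, right quantity) -> result.
# Only Newton's second law is listed; a real rule base would hold many formulae.
PATTERNS = {
    ("mass", "*", "acceleration"): "force",
    ("acceleration", "*", "mass"): "force",
}

def parse_declarations(line):
    """Read a 'C? NAME == quantity, ...' comment into {NAME: quantity}."""
    body = line.strip()[2:]  # drop the leading 'C?' marker
    pairs = (item.split("==") for item in body.split(","))
    return {name.strip(): quantity.strip() for name, quantity in pairs}

def recognize(statement, quantities):
    """Recognize 'TARGET = A op B' and record the target's quantity, if known."""
    match = re.fullmatch(r"\s*(\w+)\s*=\s*(\w+)\s*([*/+-])\s*(\w+)\s*", statement)
    if match is None:
        return None
    target, left, op, right = match.groups()
    key = (quantities.get(left), op, quantities.get(right))
    result = PATTERNS.get(key)
    if result is not None:
        quantities[target] = result  # the deduced quantity is now available downstream
    return result

quantities = parse_declarations("C? MA == mass, ACC == acceleration")
print(recognize("FF = MA * ACC", quantities))  # force
```

Recording the deduced quantity for FF is what lets a later statement that uses FF be recognized in turn.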

This experiment’s objective is to understand the limits of this automatic recognition 
procedure: Does it apply to a wide range of scientific and engineering codes? Can it 
reduce the time, risk, and effort required to develop and modify scientific code? 

Previous work (Reference 2) demonstrated that scientific concepts and formulae could be 
represented and recognized. In fact, for part of one reacting flow code (Figure 2), 50% 
of the operations can be recognized. However, this preliminary work posed several 
more questions: Can additional semantic details be represented and recognized? How 
well do the recognition rules work in blind test cases? What are the limitations of this 
procedure? 
[Figure 1 screenshot. Top window: FORTRAN source that determines inlet static 
temperature, velocity, viscosity, Reynolds number, thermal conductivity, and Prandtl 
number, with the semantic declaration "C? TSIN == TEMPERATUREABSOLUTE". Middle 
region: Quantity: DENSITY; Location: UNKNOWN; Dimensions: length^-3 mass^1; 
Units: slugs/ft3; Expertise: GAS DYNAMICS; Undefined: 35; Errors: 0; Not Understood: 7. 
Bottom region: lexicon entries, including density ("The mass of a region of space 
divided by its volume"), DERIV (the discrete derivative of one variable with respect 
to another), and DERIV2 (the discrete second derivative).] 

Figure 1: GUI display for the semantic analysis program. The top window displays 
a user’s code; variables and expressions may be selected for explanation. The 
middle region explains this selected text. In this case, the physical quantity is 
density, it does not have a grid location, and it has the displayed dimensions, units, 
and derivation. The bottom region displays the semantic dictionary/lexicon. 
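
The dimensions reported for density in Figure 1 follow from the lexicon definition (mass divided by volume). A minimal Python sketch of that exponent bookkeeping is given here; the dictionary representation and the function name divide are assumptions for illustration, not the analysis program's internal format.

```python
from collections import Counter

def divide(numerator, denominator):
    """Divide two dimension maps by subtracting exponents of each base dimension."""
    result = Counter(numerator)
    result.subtract(denominator)
    # Drop base dimensions whose exponents cancel to zero.
    return {base: power for base, power in result.items() if power != 0}

MASS = {"mass": 1}
VOLUME = {"length": 3}

density = divide(MASS, VOLUME)
print(density)  # {'mass': 1, 'length': -3}
```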


To answer these questions, the procedure’s representation and recognition of semantic 
details have been significantly extended to include expert parsers for vector analysis, 
object analysis (the object of the formula), and array reference/assignment analysis. 
Existing expert parsers have also been refined and extended. A measure of the expert 
parsers is given in Table 1. Table 2 samples the rules represented in these parsers. 
[Figure 2 plot: Subroutine Understanding vs. Semantic Declarations] 



Figure 2: Graph showing the increase in expression understanding as semantic 
declarations are added to twenty subroutines from the ALLSPD code. The 
subroutines contain 5278 non-comment FORTRAN statements and 3431 operations 
to understand. Further work will increase the understanding fraction. The analysis 
results reflect the analysis code’s quality and not the quality or ability of the ALLSPD 
code. 
Aspect                Parsers   Yacc rules   Fundamental equations 
Quantity-Math               5          772                      72 
Quantity-Physical           3          766                     114 
Value / Interval            2          223                      27 
Grid Location               4         1801                     235 
Geometrical Entity          1          447                      20 
Vector Entity               1          300                      15 
Non-Dimensional             1           72                       5 
Dimensions                  1           59                      10 
Units                       1           71                      14 
Object Analysis             1          128                      10 
Array Analysis              2          121                       3 


Table 1: Aspect analyses performed by the semantic analysis procedure, including the 
number of parsers for each aspect, the number of Yacc (Reference 1) parser rules, and 
the number of fundamental equations. Equation (1) corresponds to a fundamental 
equation; some equations require several parser rules. 
Table 2: A sampling of expert parser rules used in the semantic analysis method. 
Many rules are condensed. Due to decomposition a single operation may involve 
multiple independent aspects (units, grid location and quantity for x_coordinate - 
x_coordinate), and several rules from this table can apply to it. 
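
The decomposition into independent aspects can be pictured with a short Python sketch for the x_coordinate - x_coordinate example. The three rules, the "delta" naming, the units, and the node labels are all hypothetical stand-ins; the actual parser rules are those sampled in Table 2.

```python
# Each aspect has its own independent rule for subtraction (all rules hypothetical).
def quantity_rule(a, b):
    # The difference of two like quantities is treated as a delta of that quantity.
    return f"delta {a}" if a == b else None

def units_rule(a, b):
    # Subtraction keeps the units only when both operands agree.
    return a if a == b else None

def location_rule(a, b):
    # Two different grid locations give a result spanning both points.
    return a if a == b else f"between {a} and {b}"

SUBTRACT_RULES = {
    "quantity": quantity_rule,
    "units": units_rule,
    "location": location_rule,
}

def subtract(left, right):
    """Apply every aspect rule independently; None marks an aspect not understood."""
    return {aspect: rule(left[aspect], right[aspect])
            for aspect, rule in SUBTRACT_RULES.items()}

x_i = {"quantity": "x_coordinate", "units": "ft", "location": "node i"}
x_ip1 = {"quantity": "x_coordinate", "units": "ft", "location": "node i+1"}
print(subtract(x_ip1, x_i))
```

Because the aspects are evaluated separately, a failure in one (for example, mismatched units) does not prevent the others from being recognized.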


To understand the procedure’s generality, that is, whether the rules and recognition 
capability can apply to a range of codes, the procedure's performance was tested on 
large blind test cases. Semantic declarations for solution variables and coordinates 
were included in the ADPAC code (a 3D Navier-Stokes, curvilinear coordinate, 
turbomachinery code with 86k lines of code (loc)) and the ENG10 code (an axisymmetric, 
curvilinear coordinate, engine simulation code with 20k loc). The fraction of operations 
recognized is shown in Figure 3. These baseline results provide some initial evidence 
of generality; more important, however, is how these measurements improve as the 
procedure develops further. 
[Figure 3 plot: Expression Understanding vs. Semantic Declarations] 



Figure 3: Graph showing the increase in expression understanding as semantic 
declarations are added to two blind test cases. The ADPAC code contains 86k loc, 
and the ENG10 code contains 20k loc. Further work will increase the understanding 
fraction. The analysis results reflect the analysis code’s quality and not the 
quality or abilities of the ADPAC or ENG10 codes. 

Assessing the future of this procedure is problematic; however, experience indicates 
that three issues will determine success. First, the large number of formulae used in 
scientific codes, even within a single field, makes it difficult, but not a priori 
impossible, to capture the knowledge necessary for recognition. Second, although only 
one rule application or inference is necessary to recognize equation (1), and the 
formula sqrt(u_x^2 + u_y^2 + u_z^2) involves six inferences, O(10^2) inferences are 
often required as expressions are evaluated and combined. Needing many inferences to 
find a result magnifies the risk of failure, since an unknown inference, a limitation of 
this procedure, or a coding error will terminate the inference chain and leave the 
result unidentified. Hence, success of this method depends on good coverage of the 
domain knowledge, a robust semantic analysis procedure, and stable procedure coding. 
Third, representation of semantic details has not been a major problem; however, 
continued success in representing knowledge is important. 
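
The inference count and the fragility of long chains can be illustrated with a small Python sketch. It assumes one inference per operation or function call, uses Python's ast module as a stand-in for the expert parsers, and collapses every recognized result to a placeholder label "derived"; the names infer and count_inferences are hypothetical.

```python
import ast

def infer(node, known, counter):
    """Return the quantity of an expression node, or None if the chain breaks.

    `counter` is a one-element list that accumulates rule applications.
    """
    if isinstance(node, ast.Name):
        return known.get(node.id)      # a declared variable needs no inference
    if isinstance(node, ast.Constant):
        return "number"
    if isinstance(node, ast.BinOp):
        children = [infer(node.left, known, counter),
                    infer(node.right, known, counter)]
    elif isinstance(node, ast.Call):
        children = [infer(arg, known, counter) for arg in node.args]
    else:
        return None
    if any(child is None for child in children):
        return None                    # an unknown operand ends the chain here
    counter[0] += 1                    # one rule application for this operation
    return "derived"                   # placeholder for the recognized quantity

def count_inferences(expression, known):
    """Return (result, number of inferences made) for one expression."""
    counter = [0]
    tree = ast.parse(expression, mode="eval").body
    return infer(tree, known, counter), counter[0]

declared = {"ux": "velocity", "uy": "velocity", "uz": "velocity"}
print(count_inferences("sqrt(ux**2 + uy**2 + uz**2)", declared))  # ('derived', 6)

partial = {"ux": "velocity", "uy": "velocity"}                     # uz undeclared
print(count_inferences("sqrt(ux**2 + uy**2 + uz**2)", partial))   # (None, 3)
```

In the second call, three inferences succeed, yet the single undeclared variable leaves the whole expression unidentified, which is the failure mode described above.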

Future work will pursue two questions. First, can formulae be added to the expert 
parsers so that the knowledge domain is sufficiently covered for good recognition of 
general codes? Second, can the procedure be perfected to a useful scientific software 
tool? The best way to answer these questions is to develop the procedure further while 
testing it on more codes. 